Very fast pattern matching for highly repetitive text

نویسنده

  • David Kulp
چکیده

This paper describes two searching methods for locating longest string matches in source texts of low entropy. A modi cation of the Boyer-Moore scanning algorithm and a statistical method, which searches for less likely symbols, are presented. Both algorithms have been implemented as part of the searching strategy for an LZ77 type encoder. Experimental results are included.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A compressed dynamic self-index for highly repetitive text collections

We present a novel compressed dynamic self-index for highly repetitive text collections. Signature encoding, an existing self-index of this type, has a large disadvantage of slow pattern search for short patterns. We obtain faster pattern search by leveraging the idea behind a truncated suffix tree (TST) to develop the first compressed dynamic self-index, called the TST-index, that supports not...

متن کامل

Pattern Matching Machine for Text Compressed Using Finite State Model

The classical pattern matching problem is to nd all occurrences of patterns in a text. In many practical cases, since the text is very large and stored in the secondary storage, most of the time for the pattern matching is dominated by data transmission of the text. Therefore the text compression can speed-up the pattern matching. In this framework it is required to develop an e cient pattern m...

متن کامل

Byte pair encoding : a text compression scheme that accelerates pattern matching

Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires small work space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been so popular since the compression is rather slow and the compression ratio is not as good as other methods such as Lempel-Ziv type compression. In this paper, we bring ...

متن کامل

Speeding Up Pattern Matching by Text Compression

Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires small work space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been so popular since the compression is rather slow and the compression ratio is not as good as other methods such as Lempel-Ziv type compression. In this paper, we bring ...

متن کامل

DEPARTMENT OF COMPUTER SCIENCE Fast Text Searching With Errors

Searching for a pattern in a text file is a very common operation in many applications ranging from text editors and databases to applications in molecular biology. In many instances the pattern does not appear in the text exactly. Errors in the text or in the query can result from misspelling or from experimental errors (e.g., when the text is a DNA sequence). The use of such approximate patte...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1992